Pruning Training Samples Using a Supervised Clustering Algorithm

Authors

  • Minzhang Huang
  • Hai Zhao
  • Bao-Liang Lu
Abstract

Practical pattern classification tasks such as patent classification are often very large-scale and seriously imbalanced, and applying traditional pattern classification techniques to them in a plain way has proven both inefficient and ineffective. In this paper, a supervised clustering algorithm based on a min-max modular network with the Gaussian-zero-crossing function is adopted to prune training samples, in order to reduce training time and improve generalization accuracy. The effectiveness of the proposed training-sample pruning method is verified on a group of real patent classification tasks using support vector machines and the nearest neighbor algorithm.
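The abstract names the ingredients of the method but not the exact procedure. The Python sketch below shows one plausible reading under stated assumptions. Each module pairs one training point c_i of the target class with one point c_j of another class. It uses a Gaussian-zero-crossing discriminant, assumed here to be exp(-(‖x - c_i‖/σ)²) - exp(-(‖x - c_j‖/σ)²) with σ = λ‖c_i - c_j‖. This function is positive on c_i's side of the perpendicular bisector, zero on it, and near zero far from both points. Module outputs are combined by MIN over the other-class points and then MAX over the target-class points. The pruning rule is a hypothetical, single-pass, condensing-style rule and not the paper's stated method: a sample is kept only if the samples kept so far do not already score it above a threshold θ for its own class. The names `gzc`, `class_score`, `prune`, `predict`, `lam`, and `theta` are illustrative.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def gzc(x: Point, ci: Point, cj: Point, lam: float = 0.5) -> float:
    """Assumed Gaussian-zero-crossing discriminant for the module (ci, cj).

    Positive where x is closer to ci than to cj, zero on their perpendicular
    bisector, and close to zero far from both points. Requires ci != cj.
    """
    sigma = lam * math.dist(ci, cj)
    return (math.exp(-(math.dist(x, ci) / sigma) ** 2)
            - math.exp(-(math.dist(x, cj) / sigma) ** 2))


def class_score(x: Point, own: Sequence[Point], other: Sequence[Point],
                lam: float = 0.5) -> float:
    """MIN over modules sharing an own-class point, then MAX over own-class points."""
    return max(min(gzc(x, c, o, lam) for o in other) for c in own)


def prune(samples: Sequence[Point], labels: Sequence[Hashable],
          theta: float = 0.1, lam: float = 0.5) -> Dict[Hashable, List[Point]]:
    """Hypothetical single-pass pruning: keep a sample only if the points kept
    so far do not already give its own class a score above theta."""
    kept: Dict[Hashable, List[Point]] = {y: [] for y in labels}
    for x, y in zip(samples, labels):
        own = kept[y]
        other = [p for lbl, pts in kept.items() if lbl != y for p in pts]
        # Always keep a sample while its own class or the opposing set is empty.
        if not own or not other or class_score(x, own, other, lam) <= theta:
            own.append(x)
    return kept


def predict(x: Point, kept: Dict[Hashable, List[Point]], lam: float = 0.5) -> Hashable:
    """Assign x to the class with the highest min-max score (needs >= 2 classes)."""
    scores = {}
    for y, own in kept.items():
        other = [p for lbl, pts in kept.items() if lbl != y for p in pts]
        scores[y] = class_score(x, own, other, lam)
    return max(scores, key=scores.get)


samples = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0), (0.0, 0.1), (5.0, 5.1)]
labels = [0, 1, 0, 1, 0, 1]
kept = prune(samples, labels)
print(kept)                        # {0: [(0.0, 0.0)], 1: [(5.0, 5.0)]}
print(predict((0.2, 0.2), kept))   # 0
```

With two well-separated clusters presented in alternating order, only the first point of each class survives. A correctly labeled point near the class boundary, such as (2.4, 2.4) in class 0, scores below θ and is retained. This is the behaviour a pruning step aims for: it keeps boundary samples and drops redundant interior ones. A single pass like this is order-dependent, and the sketch assumes no two points with different labels coincide, since that would make σ zero.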

Similar resources

Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council

Supervised clustering is a data mining technique that assigns a set of data to predefined classes by analyzing dataset attributes. It is considered an important technique for information retrieval, management, and mining in information systems. Since customer satisfaction is the main goal of organizations in modern society, to meet the requirements, 137 call center of Tehran city council is ...

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...

Design and Implementation of Binary Neural Network Learning with Fuzzy Clustering

In this paper, Design and Implementation of Binary Neural Network Learning with Fuzzy Clustering (DIBNNFC) is proposed to classify semi-supervised data; it is based on the concept of binary neural network and geometrical expansion. Parameters are updated according to the geometrical location of the training samples in the input space, and each sample in the training set is learned only once. It...

A Novel Weighted Semi-Supervised Clustering Algorithm and its Application in Image Segmentation

In this paper we propose a novel weighted semi-supervised clustering algorithm and then study how to apply it to the problem of image segmentation. We explain how to obtain the weights of the semi-supervised clustering algorithm using the number of unlabeled data samples and the number of data samples. After defining the data sample weights, the next task is to obtain the cluster labels by optim...

Publication date: 2010